Goto

Collaborating Authors

 two-sample test


Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health

Neural Information Processing Systems

This position paper argues that post-deployment monitoring in clinical AI is underdeveloped and proposes statistically valid and label-efficient testing frameworks as a principled foundation for ensuring reliability and safety in real-world deployment. A recent review found that only 9% of FDA-registered AI-based healthcare tools include a post-deployment surveillance plan [1]. Existing monitoring approaches are often manual, sporadic, and reactive, making them ill-suited for the dynamic environments in which clinical models operate. We contend that post-deployment monitoring should be grounded in label-efficient and statistically valid testing frameworks, offering a principled alternative to current practices. We use the term "statistically valid" to refer to methods that provide explicit guarantees on error rates (e.g., Type I/II error), enable formal inference under pre-defined assumptions, and support reproducibility--features that align with regulatory requirements. Specifically, we propose that the detection of changes in the data and model performance degradation should be framed as distinct statistical hypothesis testing problems. Grounding monitoring in statistical rigor ensures a reproducible and scientifically sound basis for maintaining the reliability of clinical AI systems. Importantly, it also opens new research directions for the technical community--spanning theory, methods, and tools for statistically principled detection, attribution, and mitigation of post-deployment model failures in real-world settings.


A Semi-Supervised Kernel Two-Sample Test

arXiv.org Machine Learning

We consider the problem of two-sample testing in a semi-supervised setting with abundant unlabeled covariate data. Standard two-sample tests neglect covariate information, which has the potential to significantly boost performance. However, incorporating covariates potentially breaks the exchangeability assumption under the null, which further complicates a calibration procedure. To address these issues, we propose a semi-supervised method that produces a test statistic with asymptotic normality, while effectively integrating additional information from covariates. Our test is straightforward to calibrate due to the asymptotic normality under the null and achieves asymptotic power that is often much higher than existing kernel tests without covariates. Furthermore, we formally show that the proposed method is consistent in power against fixed and local alternatives. Simulations confirm the practical and theoretical strengths of our approach.





CADet: Fully Self-Supervised Anomaly Detection With Contrastive Learning

Neural Information Processing Systems

Handling out-of-distribution (OOD) samples has become a major stake in the real-world deployment of machine learning systems. This work explores the use of self-supervised contrastive learning to the simultaneous detection of two types of OOD samples: unseen classes and adversarial perturbations.